-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-48202: [C++][Parquet] Fix encoder & decoder logic to enable Parque… #48203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
4efffd9 to
8c99313
Compare
|
@Vishwanatha-HD Hey! Really appreciate you taking on the encoder/decoder paths for s390x — these two files are where a lot of the subtle BE issues first show up. One thing I ran into on real s390x hardware is that Arrow’s array buffers already store their scalars in canonical little-endian format. Because of that, per-value swapping inside the Plain/Arrow fast paths can sometimes lead to an unintended double-swap, especially when mixing Arrow-originated inputs with non-Arrow callers (e.g., DeltaBitPack or ByteStreamSplit feeding into the same decode path). A couple of spots I’m curious about here: • PlainDecoder → Arrow fast path • PlainEncoder (primitive path) • ByteStreamSplit None of this blocks the PR — just sharing things I hit in BE testing across the encode/decode → stats → page-index chain. |
@k8ika0s.. Thanks for your review comments.. While agree with you that from the structural point of view, things could be log better.. But please note that, I have tested with my changes on the s390x systems and also on Openshift AI workloads.. It works properly.. Hence there is no concern with these changes.. |
…Parquet DB support on s390x
8c99313 to
18d96e3
Compare
Vishwanatha-HD
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed all the review comments. Thanks
…t DB support on s390x
Rationale for this change
This PR is intended to enable Parquet DB support on Big-endian (s390x) systems. The fix in this PR fixes the encoder & decoder logic. Encoders and Decoders are the main part of most of the parquet & arrow-parquet testcases and needs fix for variaous encoding & decoding types.
What changes are included in this PR?
The fix includes changes to following files:
cpp/src/parquet/decoder.cc
cpp/src/parquet/encoder.cc
Are these changes tested?
Yes. The changes are tested on s390x arch to make sure things are working fine. The fix is also tested on x86 arch, to make sure there is no new regression introduced.
Are there any user-facing changes?
No
GitHub main Issue link: #48151